Bayesian linear regression is an alternative to the frequentist approach
Frequentist approach to linear regression
$y = \beta_0 + \beta_1 X_1 + \dots + \epsilon$
Minimize the error between modelled and observed y to find the best coefficients
If we minimize the residual sum of squares (RSS), there is a closed-form solution $\hat{\beta} = (X^T X)^{-1} X^T y$; this method is ordinary least squares (OLS)
Sidenote: MSE = RSS / n and RMSE = sqrt(MSE); both are computed in the sketch below
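A minimal NumPy sketch of the closed-form OLS solution and the RSS/MSE/RMSE quantities above; the synthetic data and coefficient values are illustrative assumptions, not part of the original notes:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X1 = rng.normal(size=n)
y = 2.0 + 3.0 * X1 + rng.normal(scale=0.5, size=n)   # assumed data: y = b0 + b1*X1 + noise

X = np.column_stack([np.ones(n), X1])                # design matrix with an intercept column
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)         # closed form: (X^T X)^{-1} X^T y

residuals = y - X @ beta_hat
rss = np.sum(residuals ** 2)                         # residual sum of squares
mse = rss / n                                        # mean squared error
rmse = np.sqrt(mse)                                  # root mean squared error
print(beta_hat, rss, mse, rmse)
```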
Bayesian approach to linear regression
$y \sim \mathcal{N}(\beta^T X, \sigma^2 I)$, i.e. a normal (Gaussian) distribution
The response variable is not a single point estimate but is drawn from a probability distribution
The distribution is described by its mean (the product of the transposed weight/parameter vector and the input matrix, $\beta^T X$) and its covariance (the squared noise standard deviation times the identity matrix, $\sigma^2 I$), as sketched below
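A minimal sketch of this generative view, drawing one response per observation from $\mathcal{N}(X\beta, \sigma^2 I)$; the "true" $\beta$ and $\sigma$ used here are assumed values purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one feature
beta = np.array([2.0, 3.0])                             # assumed parameter (weight) vector
sigma = 0.5                                             # assumed noise standard deviation

mean = X @ beta                                         # mean of the response distribution
y = rng.normal(loc=mean, scale=sigma)                   # one draw per observation (covariance sigma^2 I)
```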
The aim is to find the posterior distribution of the model parameters using Bayes' theorem
Posterior = (Likelihood * Prior) / Normalization
$P(\beta \mid y, X) = \frac{P(y \mid \beta, X) \times P(\beta \mid X)}{P(y \mid X)}$
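A minimal sketch of Bayes' theorem on the log scale: the unnormalized log-posterior is the log-likelihood plus the log-prior, since the evidence $P(y \mid X)$ only rescales it. The $\mathcal{N}(0, 10^2)$ prior on $\beta$ and the fixed $\sigma$ are illustrative assumptions:

```python
import numpy as np
from scipy import stats

def log_posterior(beta, X, y, sigma=0.5, prior_sd=10.0):
    # log P(y | beta, X): Gaussian likelihood with mean X @ beta and noise sigma
    log_likelihood = stats.norm.logpdf(y, loc=X @ beta, scale=sigma).sum()
    # log P(beta | X): independent Normal(0, prior_sd^2) prior on each coefficient
    log_prior = stats.norm.logpdf(beta, loc=0.0, scale=prior_sd).sum()
    return log_likelihood + log_prior  # unnormalized: the evidence P(y | X) is a constant

# e.g. log_posterior(np.array([2.0, 3.0]), X, y) for a design matrix X and responses y
```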
Advantages over the frequentist approach
Priors can contain domain knowledge beyond the observed data
The posterior is a full distribution, which allows for uncertainty/confidence analysis
Implementation
Specify priors for the model parameters
Create a model that maps the training inputs to the training outputs
Draw samples from the posterior using MCMC methods in order to approximate the posterior distribution (see the sketch below)
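A minimal sketch of these three steps, assuming PyMC (v4+) is available; the synthetic data, the choice of priors, and the sampler settings are illustrative assumptions rather than tuned choices:

```python
import numpy as np
import pymc as pm

rng = np.random.default_rng(2)
n = 100
X1 = rng.normal(size=n)
y_data = 2.0 + 3.0 * X1 + rng.normal(scale=0.5, size=n)  # assumed training data

with pm.Model() as model:
    # 1. Specify priors for the model parameters
    beta0 = pm.Normal("beta0", mu=0, sigma=10)
    beta1 = pm.Normal("beta1", mu=0, sigma=10)
    sigma = pm.HalfNormal("sigma", sigma=1)

    # 2. Model mapping the training inputs to the training outputs
    mu = beta0 + beta1 * X1
    pm.Normal("y", mu=mu, sigma=sigma, observed=y_data)

    # 3. Approximate the posterior by drawing MCMC samples
    idata = pm.sample(2000, tune=1000, chains=2, random_seed=2)

# The samples in `idata` can then be summarized for uncertainty analysis,
# e.g. with arviz.summary(idata).
```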